Odds Ratio: Interpretation + Hypothesis Test (2×2 tables)#

The odds ratio (OR) is a measure of association between two binary variables (e.g., treatment vs control and event vs no event).

This notebook focuses on:

  • what odds and odds ratios mean (and what they do not mean)

  • how to test the null hypothesis OR = 1 (“no association”)

  • a low-level NumPy implementation (no stats libraries)

  • an exact alternative (Fisher’s exact test) for small samples

import math
import os
import platform
import sys

import numpy as np

import plotly
import plotly.express as px
import plotly.graph_objects as go
import plotly.io as pio
from plotly.subplots import make_subplots

pio.renderers.default = os.environ.get("PLOTLY_RENDERER", "notebook")

rng = np.random.default_rng(42)

print("Python:", sys.version.split()[0])
print("Platform:", platform.platform())
print("NumPy:", np.__version__)
print("Plotly:", plotly.__version__)
Python: 3.12.9
Platform: Linux-6.8.0-90-generic-x86_64-with-glibc2.35
NumPy: 1.26.2
Plotly: 6.5.2

1) Odds vs probability#

A probability \(p\) lives in \([0,1]\).

The corresponding odds are:

\[\text{odds}(p) = \frac{p}{1-p}\]

Interpretation:

  • odds = 1 means “event” and “no event” are equally likely (\(p=0.5\))

  • odds = 3 means the event is 3× as likely as the non-event

The odds ratio compares odds between two groups:

\[\text{OR} = \frac{\text{odds}_1}{\text{odds}_0}\]

Important: an OR multiplies odds, not probabilities. The same OR can imply very different probability changes depending on the baseline probability.
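A quick numeric sketch of that caveat: pushing two different baselines through the same OR = 2 (via the odds, using a hypothetical helper `apply_or`) gives very different probability changes.

```python
def apply_or(p0: float, odds_ratio: float) -> float:
    """Convert a baseline probability to the probability implied by an odds ratio."""
    odds0 = p0 / (1.0 - p0)
    odds1 = odds_ratio * odds0
    return odds1 / (1.0 + odds1)


for p0 in [0.01, 0.10, 0.50]:
    p1 = apply_or(p0, 2.0)
    print(f"p0={p0:.2f} -> p1={p1:.3f} (absolute change {p1 - p0:+.3f})")
```

At a rare baseline (p0 = 0.01) doubling the odds barely moves the probability; at p0 = 0.5 it adds almost 17 percentage points.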

# Visualize probability -> odds and probability -> log-odds (logit)

p = np.linspace(0.001, 0.999, 800)
odds = p / (1.0 - p)
log_odds = np.log(odds)

fig = make_subplots(
    rows=1,
    cols=2,
    subplot_titles=("Odds = p/(1-p)", "Log-odds = log(p/(1-p))"),
)

fig.add_trace(go.Scatter(x=p, y=odds, mode="lines", name="odds"), row=1, col=1)
fig.update_yaxes(type="log", title_text="odds (log scale)", row=1, col=1)
fig.update_xaxes(title_text="probability p", row=1, col=1)

fig.add_trace(go.Scatter(x=p, y=log_odds, mode="lines", name="log-odds"), row=1, col=2)
fig.update_yaxes(title_text="log-odds", row=1, col=2)
fig.update_xaxes(title_text="probability p", row=1, col=2)

# Helpful reference line: p=0.5 -> odds=1 -> log-odds=0
for col in [1, 2]:
    fig.add_vline(x=0.5, line_dash="dash", line_color="gray", row=1, col=col)

fig.update_layout(title="Odds are an unbounded re-parameterization of probability", showlegend=False, height=420)
fig.show()

2) Odds ratio from a 2×2 table#

For two binary variables (e.g., Exposure and Outcome), summarize counts in a 2×2 table:

|              | Outcome = 1 | Outcome = 0 |
|--------------|-------------|-------------|
| Exposure = 1 | \(a\)       | \(b\)       |
| Exposure = 0 | \(c\)       | \(d\)       |

Group odds:

\[\text{odds}_1 = \frac{a}{b},\qquad \text{odds}_0 = \frac{c}{d}\]

Odds ratio:

\[\text{OR} = \frac{a/b}{c/d} = \frac{ad}{bc}\]

  • OR = 1: equal odds (no association)

  • OR > 1: higher odds of outcome in the exposed group

  • OR < 1: lower odds of outcome in the exposed group

If any of \(a,b,c,d\) are 0, the raw OR can be 0 or \(\infty\). A common small-sample fix is the Haldane–Anscombe correction: add 0.5 to every cell before computing OR and standard errors.
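To see why the correction matters, consider a hypothetical table with a zero cell: the raw OR is infinite, while adding 0.5 to every cell yields a finite (if conservative) estimate.

```python
import numpy as np

zero_table = np.array([[10.0, 90.0],
                       [0.0, 100.0]])  # no events in the unexposed group

a, b = zero_table[0]
c, d = zero_table[1]

with np.errstate(divide="ignore"):
    raw_or = (a * d) / (b * c)  # b*c == 0, so this divides by zero -> inf
print("raw OR:", raw_or)

# Haldane–Anscombe: add 0.5 to every cell before computing the OR
ac, bc_, cc, dc = (zero_table + 0.5).ravel()
corrected_or = (ac * dc) / (bc_ * cc)
print("corrected OR:", corrected_or)
```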

# Example 2×2 table
# Rows: [exposed, unexposed]
# Cols: [event, no event]

table = np.array([[30, 70],
                  [15, 85]], dtype=float)

a, b = table[0]
c, d = table[1]

p1 = a / (a + b)
p0 = c / (c + d)

odds1 = a / b
odds0 = c / d
OR = odds1 / odds0

print(f"P(event | exposed)   = {p1:.3f}")
print(f"P(event | unexposed) = {p0:.3f}")
print(f"odds(exposed)        = {odds1:.3f}")
print(f"odds(unexposed)      = {odds0:.3f}")
print(f"odds ratio (OR)      = {OR:.3f}")

fig = go.Figure(
    data=go.Heatmap(
        z=table,
        x=["Event", "No event"],
        y=["Exposed", "Unexposed"],
        colorscale="Blues",
        text=table.astype(int),
        texttemplate="%{text}",
        hovertemplate="%{y} / %{x}<br>count=%{z}<extra></extra>",
    )
)
fig.update_layout(title="Observed 2×2 counts", height=360)
fig.show()
P(event | exposed)   = 0.300
P(event | unexposed) = 0.150
odds(exposed)        = 0.429
odds(unexposed)      = 0.176
odds ratio (OR)      = 2.429

3) What hypothesis is being tested?#

In a 2×2 table, the most common null hypothesis is:

\[H_0: \text{OR} = 1\]

This is equivalent to independence between exposure and outcome (no association).

A typical workflow:

  1. summarize your data as a 2×2 table

  2. compute an estimate of OR (effect size)

  3. compute a p-value and/or confidence interval for OR

Two common ways to test \(H_0\):

  • Large-sample Wald test using \(\log(\widehat{\text{OR}})\) (fast, approximate)

  • Fisher’s exact test (exact under fixed margins; better for small counts)

Large-sample Wald test on the log-odds-ratio#

We test on the log scale because OR is positive and skewed, while \(\log(\text{OR})\) is often close to normal.

Estimate:

\[\widehat{\log(\text{OR})} = \log\left(\frac{ad}{bc}\right)\]

Approximate standard error:

\[\mathrm{SE}\big(\widehat{\log(\text{OR})}\big) = \sqrt{\frac{1}{a}+\frac{1}{b}+\frac{1}{c}+\frac{1}{d}}\]

Test statistic (approximately standard normal under \(H_0\)):

\[z = \frac{\widehat{\log(\text{OR})}}{\mathrm{SE}}\]

Two-sided p-value:

\[p = 2\,\big(1-\Phi(|z|)\big)\]

A \((1-\alpha)\) confidence interval for OR:

\[\exp\left(\widehat{\log(\text{OR})} \pm z_{1-\alpha/2}\,\mathrm{SE}\right)\]

Notes:

  • This approximation is usually OK when all cells are reasonably large (rule of thumb: ~5+).

  • If any cell is very small (or 0), prefer Fisher’s exact test and/or use a continuity correction.
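As a sanity check on these formulas, here is the Wald computation for the example table `[[30, 70], [15, 85]]` done by hand with the standard library, using `math.erfc` for the exact normal tail instead of the polynomial approximation implemented below (note that \(\mathrm{erfc}(|z|/\sqrt{2}) = 2(1-\Phi(|z|))\), so no extra factor of 2 is needed).

```python
import math

a, b, c, d = 30.0, 70.0, 15.0, 85.0

log_or = math.log((a * d) / (b * c))
se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
z = log_or / se

# erfc(|z|/sqrt(2)) equals the two-sided tail probability 2*(1 - Phi(|z|))
p = math.erfc(abs(z) / math.sqrt(2.0))

print(f"log(OR) = {log_or:.4f}, SE = {se:.4f}")
print(f"z = {z:.3f}, two-sided p = {p:.4f}")
```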

# --- Normal CDF/PPF approximations (NumPy-only) ---

SQRT_2PI = np.sqrt(2.0 * np.pi)


def normal_pdf(x: np.ndarray) -> np.ndarray:
    x = np.asarray(x, dtype=float)
    return np.exp(-0.5 * x * x) / SQRT_2PI


def normal_cdf(x: np.ndarray) -> np.ndarray:
    # Approximate Φ(x) using a classic 5th-order rational approximation.
    x = np.asarray(x, dtype=float)

    # Abramowitz & Stegun style approximation
    p = 0.2316419
    b1 = 0.319381530
    b2 = -0.356563782
    b3 = 1.781477937
    b4 = -1.821255978
    b5 = 1.330274429

    ax = np.abs(x)
    t = 1.0 / (1.0 + p * ax)
    poly = (((((b5 * t + b4) * t + b3) * t + b2) * t + b1) * t)

    cdf_pos = 1.0 - normal_pdf(ax) * poly
    return np.where(x >= 0.0, cdf_pos, 1.0 - cdf_pos)


def normal_ppf(p: np.ndarray) -> np.ndarray:
    # Approximate Φ^{-1}(p) (Acklam's approximation).
    p = np.asarray(p, dtype=float)
    if np.any((p <= 0.0) | (p >= 1.0)):
        raise ValueError("p must be in (0, 1)")

    # Coefficients in rational approximations
    a = np.array([
        -3.969683028665376e01,
        2.209460984245205e02,
        -2.759285104469687e02,
        1.383577518672690e02,
        -3.066479806614716e01,
        2.506628277459239e00,
    ])
    b = np.array([
        -5.447609879822406e01,
        1.615858368580409e02,
        -1.556989798598866e02,
        6.680131188771972e01,
        -1.328068155288572e01,
    ])
    c = np.array([
        -7.784894002430293e-03,
        -3.223964580411365e-01,
        -2.400758277161838e00,
        -2.549732539343734e00,
        4.374664141464968e00,
        2.938163982698783e00,
    ])
    d = np.array([
        7.784695709041462e-03,
        3.224671290700398e-01,
        2.445134137142996e00,
        3.754408661907416e00,
    ])

    plow = 0.02425
    phigh = 1.0 - plow

    x = np.empty_like(p)

    # Lower region
    m = p < plow
    q = np.sqrt(-2.0 * np.log(p[m]))
    x[m] = -(
        (((((c[0] * q + c[1]) * q + c[2]) * q + c[3]) * q + c[4]) * q + c[5])
        / ((((d[0] * q + d[1]) * q + d[2]) * q + d[3]) * q + 1.0)
    )

    # Upper region
    m = p > phigh
    q = np.sqrt(-2.0 * np.log(1.0 - p[m]))
    x[m] = (
        (((((c[0] * q + c[1]) * q + c[2]) * q + c[3]) * q + c[4]) * q + c[5])
        / ((((d[0] * q + d[1]) * q + d[2]) * q + d[3]) * q + 1.0)
    )

    # Central region
    m = (~(p < plow)) & (~(p > phigh))
    q = p[m] - 0.5
    r = q * q
    x[m] = (
        (((((a[0] * r + a[1]) * r + a[2]) * r + a[3]) * r + a[4]) * r + a[5]) * q
        / (((((b[0] * r + b[1]) * r + b[2]) * r + b[3]) * r + b[4]) * r + 1.0)
    )

    return x


def odds_ratio_wald_test(
    table_2x2: np.ndarray,
    correction: float = 0.0,
    alternative: str = "two-sided",
    confidence_level: float = 0.95,
):
    # Wald test and CI for OR in a 2×2 table.
    table_2x2 = np.asarray(table_2x2, dtype=float)
    if table_2x2.shape != (2, 2):
        raise ValueError("table_2x2 must have shape (2, 2)")

    a, b = table_2x2[0] + correction
    c, d = table_2x2[1] + correction

    log_or = np.log(a) + np.log(d) - np.log(b) - np.log(c)
    se = np.sqrt(1.0 / a + 1.0 / b + 1.0 / c + 1.0 / d)

    z = log_or / se

    if alternative == "two-sided":
        p_value = 2.0 * (1.0 - normal_cdf(np.abs(z)))
    elif alternative == "greater":
        p_value = 1.0 - normal_cdf(z)
    elif alternative == "less":
        p_value = normal_cdf(z)
    else:
        raise ValueError("alternative must be: 'two-sided', 'greater', or 'less'")

    alpha = 1.0 - confidence_level
    zcrit = normal_ppf(1.0 - alpha / 2.0)
    ci_log = (log_or - zcrit * se, log_or + zcrit * se)
    ci_or = (float(np.exp(ci_log[0])), float(np.exp(ci_log[1])))

    return {
        'odds_ratio': float(np.exp(log_or)),
        'log_odds_ratio': float(log_or),
        'se_log_odds_ratio': float(se),
        'z': float(z),
        'p_value': float(p_value),
        'confidence_level': float(confidence_level),
        'ci_or': ci_or,
    }


res = odds_ratio_wald_test(table, correction=0.0)
res_cc = odds_ratio_wald_test(table, correction=0.5)

print("Wald test (no correction)")
print(res)
print("\nWald test (Haldane–Anscombe correction = 0.5)")
print(res_cc)
# Visualize the z statistic on a standard normal curve

z = res["z"]
zabs = abs(z)

x = np.linspace(-4.0, 4.0, 1200)
y = normal_pdf(x)

fig = go.Figure()
fig.add_trace(go.Scatter(x=x, y=y, mode="lines", name="N(0,1) pdf"))

# Shade the two tails beyond ±|z|
left = x <= -zabs
right = x >= zabs

fig.add_trace(go.Scatter(
    x=x[left],
    y=y[left],
    mode="lines",
    line=dict(color="rgba(214,39,40,0.0)"),
    fill="tozeroy",
    fillcolor="rgba(214,39,40,0.25)",
    showlegend=False,
))
fig.add_trace(go.Scatter(
    x=x[right],
    y=y[right],
    mode="lines",
    line=dict(color="rgba(214,39,40,0.0)"),
    fill="tozeroy",
    fillcolor="rgba(214,39,40,0.25)",
    showlegend=False,
))

fig.add_vline(x=-zabs, line_dash="dash", line_color="firebrick")
fig.add_vline(x= zabs, line_dash="dash", line_color="firebrick")

fig.update_layout(
    title=f"Wald test: z = {z:.3f}, two-sided p = {res['p_value']:.4f}",
    xaxis_title="z",
    yaxis_title="density",
    height=420,
)
fig.show()

A quick simulation check (why z ≈ N(0,1) under H0)#

Under \(H_0\) (no association), if sample sizes are large enough, the Wald statistic behaves like a standard normal.

We’ll simulate many 2×2 tables under independence and look at the distribution of the resulting z-statistics.

def simulate_tables_under_h0(
    n_exposed: int,
    n_unexposed: int,
    p_event: float,
    n_sims: int = 30_000,
):
    # Simulate 2×2 tables where event is independent of exposure.
    a = rng.binomial(n_exposed, p_event, size=n_sims)
    c = rng.binomial(n_unexposed, p_event, size=n_sims)

    b = n_exposed - a
    d = n_unexposed - c

    tables = np.stack([np.stack([a, b], axis=1), np.stack([c, d], axis=1)], axis=1)
    return tables


def wald_z_for_tables(tables: np.ndarray, correction: float = 0.5) -> np.ndarray:
    tables = np.asarray(tables, dtype=float)
    a = tables[:, 0, 0] + correction
    b = tables[:, 0, 1] + correction
    c = tables[:, 1, 0] + correction
    d = tables[:, 1, 1] + correction

    log_or = np.log(a) + np.log(d) - np.log(b) - np.log(c)
    se = np.sqrt(1.0 / a + 1.0 / b + 1.0 / c + 1.0 / d)
    return log_or / se


tables_h0 = simulate_tables_under_h0(n_exposed=200, n_unexposed=200, p_event=0.2, n_sims=30_000)
zs = wald_z_for_tables(tables_h0, correction=0.5)

fig = px.histogram(zs, nbins=80, histnorm="probability density", opacity=0.7)

x = np.linspace(-4, 4, 600)
fig.add_trace(go.Scatter(x=x, y=normal_pdf(x), mode="lines", name="N(0,1) pdf"))

fig.update_layout(
    title="Simulated z-statistics under H0 (independence)",
    xaxis_title="z",
    yaxis_title="density",
    height=420,
)
fig.show()

print("mean(z):", float(np.mean(zs)))
print("std(z): ", float(np.std(zs)))

4) Exact alternative: Fisher’s exact test#

When counts are small (or a cell is 0), the Wald test can be inaccurate.

Fisher’s exact test computes an exact p-value for independence in a 2×2 table conditional on the row and column totals.

Under \(H_0\) and fixed margins, the top-left cell \(a\) follows a hypergeometric distribution. The p-value is computed by summing probabilities of tables at least as “surprising” as the observed one.

We’ll implement a standard two-sided Fisher p-value by summing probabilities of all tables with probability ≤ the observed table’s probability.

LGAMMA = np.vectorize(math.lgamma, otypes=[float])


def log_choose(n: int, k: np.ndarray) -> np.ndarray:
    k = np.asarray(k, dtype=float)

    # log(n!) = lgamma(n+1) is stable for large n
    return LGAMMA(n + 1.0) - LGAMMA(k + 1.0) - LGAMMA(n - k + 1.0)


def fisher_exact(table_2x2: np.ndarray, alternative: str = "two-sided"):
    table_2x2 = np.asarray(table_2x2, dtype=int)
    if table_2x2.shape != (2, 2):
        raise ValueError("table_2x2 must have shape (2, 2)")

    a_obs, b_obs = table_2x2[0]
    c_obs, d_obs = table_2x2[1]

    r1 = a_obs + b_obs
    r2 = c_obs + d_obs
    c1 = a_obs + c_obs
    c2 = b_obs + d_obs
    n = r1 + r2

    a_min = max(0, c1 - r2)
    a_max = min(r1, c1)

    a_vals = np.arange(a_min, a_max + 1)

    # Hypergeometric probabilities with fixed margins
    log_p = log_choose(c1, a_vals) + log_choose(c2, r1 - a_vals) - log_choose(n, r1)

    # Stabilize before exponentiating
    log_p -= np.max(log_p)
    p = np.exp(log_p)
    p /= np.sum(p)

    # Observed probability
    p_obs = float(p[a_vals == a_obs][0])

    if alternative == "two-sided":
        p_value = float(np.sum(p[p <= p_obs + 1e-15]))
    elif alternative == "less":
        p_value = float(np.sum(p[a_vals <= a_obs]))
    elif alternative == "greater":
        p_value = float(np.sum(p[a_vals >= a_obs]))
    else:
        raise ValueError("alternative must be: 'two-sided', 'greater', or 'less'")

    return {
        'a_support': a_vals,
        'pmf': p,
        'p_value': p_value,
        'p_observed_table': p_obs,
    }


fisher = fisher_exact(table, alternative="two-sided")
print("Fisher two-sided p-value:", fisher["p_value"])

# Visualize the exact null distribution of 'a' given the margins
colors = np.where(fisher["pmf"] <= fisher["p_observed_table"] + 1e-15, "crimson", "steelblue")

fig = go.Figure()
fig.add_trace(go.Bar(
    x=fisher["a_support"],
    y=fisher["pmf"],
    marker_color=colors,
))
fig.add_vline(x=int(table[0, 0]), line_dash="dash", line_color="black")

fig.update_layout(
    title=f"Fisher's exact test (two-sided p = {fisher['p_value']:.4f})",
    xaxis_title="a (top-left cell count)",
    yaxis_title="P(A=a | margins)",
    height=420,
)
fig.show()
Fisher two-sided p-value: 0.017149022768408462

5) Interpreting OR in terms of probabilities (why OR can mislead)#

Because OR multiplies odds, the same OR can correspond to different probability changes.

If baseline probability is \(p_0\) and the odds ratio is OR, then:

  • baseline odds: \(\text{odds}_0 = \frac{p_0}{1-p_0}\)

  • new odds: \(\text{odds}_1 = \text{OR}\cdot\text{odds}_0\)

  • implied new probability:

\[p_1 = \frac{\text{OR}\,p_0}{1 - p_0 + \text{OR}\,p_0}\]

We’ll plot \(p_1\) vs \(p_0\) for a few OR values, and also the risk ratio \(p_1/p_0\).

def p1_from_p0_and_or(p0: np.ndarray, odds_ratio: float) -> np.ndarray:
    p0 = np.asarray(p0, dtype=float)
    return (odds_ratio * p0) / (1.0 - p0 + odds_ratio * p0)


p0 = np.linspace(0.001, 0.999, 500)
ors = [0.5, 1.0, 2.0, 5.0]

fig = make_subplots(
    rows=1,
    cols=2,
    subplot_titles=("Implied p1 for a fixed OR", "Risk ratio p1/p0 for a fixed OR"),
)

for OR in ors:
    p1 = p1_from_p0_and_or(p0, OR)
    rr = p1 / p0

    fig.add_trace(go.Scatter(x=p0, y=p1, mode="lines", name=f"OR={OR}"), row=1, col=1)
    fig.add_trace(go.Scatter(x=p0, y=rr, mode="lines", name=f"OR={OR}", showlegend=False), row=1, col=2)

fig.update_xaxes(title_text="baseline probability p0", row=1, col=1)
fig.update_yaxes(title_text="implied probability p1", row=1, col=1)

fig.update_xaxes(title_text="baseline probability p0", row=1, col=2)
fig.update_yaxes(title_text="risk ratio = p1/p0", row=1, col=2)

fig.update_layout(title="Same OR, different probability changes", height=460)
fig.show()

6) How to interpret the test result#

What a p-value answers (in this context):

If there were truly no association between exposure and outcome (OR = 1), how unusual would it be to see a table at least this extreme (by the chosen test)?

  • Small p-value (e.g., < 0.05): evidence against OR = 1 (an association is plausible)

  • Large p-value: the data are consistent with OR = 1 (but this is not proof of no effect)

What the confidence interval answers:

A range of OR values compatible with the data (given the test’s assumptions).

If the CI excludes 1, it aligns with rejecting OR=1 at the corresponding level.

Always report the effect size (OR) and uncertainty (CI), not just the p-value.

7) Practical notes + pitfalls#

  • OR is symmetric: swapping rows or columns in the 2×2 table inverts the OR (interpret carefully).

  • Case-control studies often estimate OR because sampling is conditional on the outcome.

  • OR ≠ risk ratio unless the event is rare; for common outcomes OR can look “bigger” than the risk ratio.

  • Small counts / zeros: prefer Fisher’s exact test (or use continuity corrections / exact logistic regression).

  • Association ≠ causation: confounding can produce OR ≠ 1 even without a causal effect.
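The OR-vs-risk-ratio point can be checked numerically (with hypothetical probabilities): for a rare outcome the two nearly agree, while for a common one the OR looks markedly larger.

```python
def or_and_rr(p0: float, p1: float) -> tuple[float, float]:
    """Return (odds ratio, risk ratio) for baseline p0 and comparison p1."""
    rr = p1 / p0
    odds_ratio = (p1 / (1.0 - p1)) / (p0 / (1.0 - p0))
    return odds_ratio, rr


for p0, p1 in [(0.01, 0.02), (0.40, 0.60)]:
    odds_ratio, rr = or_and_rr(p0, p1)
    print(f"p0={p0:.2f}, p1={p1:.2f}: RR={rr:.2f}, OR={odds_ratio:.2f}")
```

With a rare outcome (0.01 vs 0.02) the RR is 2.00 and the OR is 2.02; with a common one (0.40 vs 0.60) the RR is 1.50 but the OR is 2.25.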

8) Exercises#

  1. Create a 2×2 table with a zero cell and compare OR and Wald p-values with correction=0 and correction=0.5.

  2. For a fixed OR=2, read off the implied \(p_1\) when \(p_0 \in \{0.01, 0.1, 0.5\}\). Why do the probability changes differ so much?

  3. Implement a function that takes raw individual-level data (two boolean arrays) and returns the 2×2 table, OR, and test results.

References#

  • Agresti, An Introduction to Categorical Data Analysis

  • Fisher’s exact test and the hypergeometric distribution (standard categorical data texts)

  • Relationship to logistic regression: for a binary predictor, the logistic regression coefficient equals \(\log(\text{OR})\).
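That last connection can be verified numerically. The sketch below (NumPy-only, in keeping with this notebook; a real analysis would use a fitted library) expands the example table `[[30, 70], [15, 85]]` into individual-level data and fits logistic regression by Newton–Raphson; the fitted slope matches \(\log(ad/bc)\).

```python
import numpy as np

# Individual-level data matching the table [[30, 70], [15, 85]]
x = np.concatenate([np.ones(100), np.zeros(100)])    # exposure indicator
y = np.concatenate([np.ones(30), np.zeros(70),       # exposed: 30 events / 100
                    np.ones(15), np.zeros(85)])      # unexposed: 15 events / 100

X = np.column_stack([np.ones_like(x), x])            # intercept + exposure
beta = np.zeros(2)

for _ in range(25):                                  # Newton-Raphson for the logistic MLE
    mu = 1.0 / (1.0 + np.exp(-(X @ beta)))           # fitted probabilities
    W = mu * (1.0 - mu)                              # IRLS weights
    grad = X.T @ (y - mu)
    hess = X.T @ (X * W[:, None])
    beta = beta + np.linalg.solve(hess, grad)

log_or_table = np.log((30 * 85) / (70 * 15))
print("logistic slope:", beta[1])
print("log(ad/bc):   ", log_or_table)
```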